Abstract. State-space reduction techniques, used primarily in model-checkers, 
all rely on the idea that some actions are independent, hence could be taken in 
any (respective) order while put in parallel, without changing the semantics. It 
is thus not necessary to consider all execution paths in the interleaving seman- 
tics of a concurrent program, but rather some equivalence classes. The purpose 
of this paper is to describe a new algorithm to compute such equivalence classes, 
and a representative per class, which is based on ideas originating in algebraic 
topology. We introduce a geometric semantics of concurrent languages, where 
programs are interpreted as directed topological spaces, and study its properties 
in order to devise an algorithm for computing dihomotopy classes of execution 
paths. In particular, our algorithm is able to compute a control-flow graph for con- 
current programs, possibly containing loops, which is "as reduced as possible" in 
the sense that it generates traces modulo equivalence. A preliminary implementation
has been carried out; it shows promising results towards efficient methods for analyzing
concurrent programs, in particular when compared with partial-order reduction techniques.

Introduction 

Formal verification of concurrent programs is traditionally considered as a difficult 
problem because it might involve checking all their possible schedulings, in order to 
verify all the behaviors the programs may exhibit. This is particularly the case for 
checking for liveness or reachability properties, or in the case of verification methods 
that imply traversal of some important parts of the graph of execution, such as model- 
checking [4] and abstract testing [6]. Fortunately, many of the possible executions are 
equivalent (we say dihomotopic) in the sense that one can be obtained from the other by 
permuting independent instructions, therefore giving rise to the same results. In order 
to analyze a program, it is thus enough (and much faster) to analyze one representative 
in each dihomotopy class of execution traces. 

We introduce in this paper a new algorithm to reduce the state-space explosion 
during the analysis of concurrent systems. It is based on former work of some of the 
authors, most notably [24] where the notion of trace space is introduced and studied, 
and also builds up considerably on the geometric semantics approach to concurrent 
systems, as developed in [13]. Some fundamentals of the mathematics involved can be 
found in [19]. The main contributions of this article are the following: we develop and
improve the algorithms for computing trace spaces of [24] by reformulating them in
order to devise an efficient implementation; we generalize this algorithm to
programs which may contain loops and thus exhibit an infinite number of behaviors;
we apply these algorithms to a toy shared-memory language whose semantics is given
in the style of [12] but formulated here in terms of d-spaces [19]; and we
report on the implementation of our algorithms on trace spaces and on experiments with it
(an industrial case study using those methods is detailed in [3]).

Stubborn sets [25], sleep sets and persistent sets [15] are among the most popu- 
lar methods used for diminishing the complexity of model-checking using transition 
systems; they are in particular used in SPIN [1], with which we compare our work ex- 
perimentally in Section 2.5. They are based on semantic observations using Petri nets 
in the first case and Mazurkiewicz trace theory in the other one. We believe that these 
are special forms of dihomotopy-based reduction as developed in this paper when cast 
in our geometric framework, using the adjunctions of [18]. Of course, the trace spaces
we are computing are closely related to traces as found in trace theory [7]:
basically, traces in trace theory are points of trace spaces, and composition of traces 
modulo dihomotopy is concatenation in trace theory. Trace spaces are more general in 
that they consider general directed topological spaces and not just partially commutative 
monoids; they also include all information related to higher-dimensional (di-)homotopy 
categories, and not just the fundamental category, as in trace theory. Trace spaces are 
also linked with component categories, introduced by some of the authors [14,17], and 
connected components of trace spaces can also be computed using the algorithm intro- 
duced in [16]. 

Contents of the paper. We first define formally the programming language we are
considering (Section 1.1) as well as an associated geometric semantics (Section 1.2).
We then introduce an algorithm for computing an effective combinatorial representation
of trace spaces as well as an efficient implementation of it (Section 2), and extend this
algorithm in order to handle programs containing loops (Section 3). Finally, we discuss
various applications, in particular to static analysis (Section 3.5), and possible extensions
of the algorithm, and conclude.

1 Geometric semantics of concurrent processes 
1.1 A toy shared-memory concurrent language 

In this paper, we consider a toy imperative shared-memory concurrent language as
grounds for experimentation. In this formalism, a program can be constituted of multiple
subprograms which are run in parallel. The environment provides a set of resources
$\mathcal{R}$, where each resource $a \in \mathcal{R}$ can be used by at most $\kappa_a$ subprograms at the
same time, the integer $\kappa_a \in \mathbb{N}$ being called the capacity of the resource $a$. In particular,
a mutex is a resource of capacity 1.

Whenever a program wants to access a resource $a$, it should acquire a lock by performing
the action $P_a$, which allows access to $a$ if the lock is granted. Once it does
not need the resource anymore, the program can release the lock by performing the
action $V_a$, following again the notation set up by Dijkstra [8]. If a subprogram tries to
acquire a lock on a resource $a$ when the resource has already been locked $\kappa_a$ times,
the subprogram is stuck until the resource is released by another subprogram. In order
to be realistic even though simple, the language considered here also comprises a
sequential composition operator ., a non-deterministic choice operator + and a loop
construct $(-)^*$, with similar semantics as in regular languages (it should be thought of as
a while construct), as well as a parallel composition operator | to launch two subprograms
in parallel.

Programs p are defined by the following grammar: 

$$p \;::=\; 1 \;\mid\; P_a \;\mid\; V_a \;\mid\; p.p \;\mid\; p\,|\,p \;\mid\; p+p \;\mid\; p^*$$

Programs are considered modulo a structural congruence $\equiv$ which imposes that the
operators ., + and | are associative and admit 1 as neutral element. A thread is a program
which does not contain the parallel composition operator |.
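The grammar above can be transcribed as an algebraic data type. The following is only a minimal sketch: the constructor names and the choice of strings for resource names are assumptions made for illustration, and the structural congruence is not modelled.

```ocaml
(* Abstract syntax of the toy language. Resource names are strings here,
   which is an assumption: the text leaves the set of resources abstract. *)
type prog =
  | One                    (* the empty program 1 *)
  | P of string            (* P_a: acquire a lock on resource a *)
  | V of string            (* V_a: release the lock on resource a *)
  | Seq of prog * prog     (* p.q *)
  | Par of prog * prog     (* p|q *)
  | Choice of prog * prog  (* p+q *)
  | Star of prog           (* p* *)

(* A thread is a program without parallel composition. *)
let rec is_thread = function
  | One | P _ | V _ -> true
  | Seq (p, q) | Choice (p, q) -> is_thread p && is_thread q
  | Star p -> is_thread p
  | Par _ -> false

(* The program P_a.V_a | P_a.V_a: two threads competing for resource a. *)
let example = Par (Seq (P "a", V "a"), Seq (P "a", V "a"))
```

Here `is_thread example` is false, while each of the two components of `example` is a thread.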

1.2 Geometric semantics 

We introduce here a semantics based on (directed) topological spaces. The geometric 
semantics will allow a different representation of n pairwise independent actions (as the 
surface of an n-cube) and n truly concurrent actions as the full n-cube. 

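We denote by $I = [0,1] \subset \mathbb{R}$ the standard euclidean interval. A path $p$ in a topological
space $X$ is a continuous map $p : I \to X$, and the points $p(0)$ and $p(1)$ are
respectively called the source and target of the path. Given two paths $p$ and $q$ such that
$p(1) = q(0)$, we define their concatenation as the path $p \cdot q$ defined by

$$(p \cdot q)(t) = \begin{cases} p(2t) & \text{if } 0 \le t \le 1/2, \\ q(2t-1) & \text{if } 1/2 \le t \le 1. \end{cases}$$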



A topological space can be equipped with a notion of "direction" as follows [19]: 

Definition 1. A directed topological space (or d-space for short) $X = (X, dX)$ consists
of a topological space $X$ together with a set $dX$ of paths in $X$ (the directed paths)
such that

1. constant paths: every constant path is directed,

2. reparametrization: $dX$ is closed under precomposition with (not necessarily surjective)
increasing maps $I \to I$, which are called reparametrizations,

3. concatenation: $dX$ is closed under concatenation.

A morphism of d-spaces $f : X \to Y$, a directed map, is a continuous function $f : X \to Y$
which preserves directed paths, in the sense that $f(dX) \subseteq dY$.
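The two closure operations of Definition 1 can be illustrated on paths represented as functions on the unit interval. This is a sketch under assumptions: the names `concat` and `reparam` are hypothetical, concatenation uses the usual convention of running each path at double speed, and neither continuity nor directedness is checked.

```ocaml
(* A path in a space of points 'a, as a function on [0,1]. *)
type 'a path = float -> 'a

(* Concatenation: p on [0,1/2] and q on [1/2,1], both at double speed.
   Only meaningful when p 1. = q 0. *)
let concat (p : 'a path) (q : 'a path) : 'a path =
  fun t -> if t <= 0.5 then p (2. *. t) else q (2. *. t -. 1.)

(* Precomposition with an increasing map phi : [0,1] -> [0,1]. *)
let reparam (phi : float -> float) (p : 'a path) : 'a path =
  fun t -> p (phi t)

(* Two paths in the plane that are increasing in each coordinate. *)
let right : (float * float) path = fun t -> (t, 0.)
let up : (float * float) path = fun t -> (1., t)

(* Go right along the bottom edge of the unit square, then up. *)
let right_then_up = concat right up
```

For instance, `right_then_up 0.25` is the point `(0.5, 0.)` and `right_then_up 1.` is the corner `(1., 1.)`.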

The category of d-spaces is complete and cocomplete [19]. This allows us to ab- 
stractly define some constructions on d-spaces, which extend usual constructions on 
topological spaces, that we detail here explicitly by describing the associated directed 
paths. 

- The terminal d-space $*$ is the space reduced to one point.

- The cartesian product $X \times Y$ of two d-spaces $X$ and $Y$ has $d(X \times Y) = dX \times dY$.

- The disjoint union $X \uplus Y$ of two d-spaces $X$ and $Y$ is such that $d(X \uplus Y) = dX \uplus dY$.

- The amalgamation $X[x = y]$ of two points $x$ and $y$ in a d-space $X$ is the d-space $X$
where $x$ and $y$ have been identified, together with the expected set of directed paths.

- Given a d-space $X$ and a topological space $Y \subseteq X$, the subspace $Y$ can be canonically
equipped with a structure of d-space by $dY = \{p \in dX \mid p(I) \subseteq Y\}$.

The geometric semantics of a program is defined using those constructions as follows: 

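Definition 2. To every program $p$, we associate a d-space $G_p$ together with a pair of
points $b_p, e_p \in G_p$, respectively called beginning and end, and a resource function
$r_p : \mathcal{R} \times G_p \to \mathbb{Z}$ which indicates the number of locks the program holds at a given
point. The definition of these is done by induction on the structure of $p$ as follows:

$$G_1 = *, \qquad b_1 = *, \qquad e_1 = *, \qquad r_1(a,x) = 0$$

$$G_{P_a} = I, \qquad b_{P_a} = 0, \qquad e_{P_a} = 1, \qquad r_{P_a}(b,x) = \begin{cases} -1 & \text{if } b = a \text{ and } x > 0 \\ 0 & \text{if } b \neq a \text{ or } x = 0 \end{cases}$$

$$G_{V_a} = I, \qquad b_{V_a} = 0, \qquad e_{V_a} = 1, \qquad r_{V_a}(b,x) = \begin{cases} 1 & \text{if } b = a \text{ and } x = 1 \\ 0 & \text{if } b \neq a \text{ or } x < 1 \end{cases}$$

$$G_{p.q} = (G_p \uplus I \uplus G_q)[e_p = 0,\ 1 = b_q], \qquad b_{p.q} = b_p, \qquad e_{p.q} = e_q, \qquad r_{p.q}(a,x) = \begin{cases} r_p(a,x) & \text{if } x \in G_p \\ r_p(a,e_p) + r_q(a,x) & \text{if } x \in G_q \end{cases}$$

$$G_{p+q} = (G_p \uplus G_q)[b_p = b_q,\ e_p = e_q], \qquad b_{p+q} = b_p, \qquad e_{p+q} = e_q, \qquad r_{p+q}(a,x) = \begin{cases} r_p(a,x) & \text{if } x \in G_p \\ r_q(a,x) & \text{if } x \in G_q \end{cases}$$

$$G_{p|q} = G_p \times G_q, \qquad b_{p|q} = (b_p, b_q), \qquad e_{p|q} = (e_p, e_q), \qquad r_{p|q}(a,(x,y)) = r_p(a,x) + r_q(a,y)$$

$$G_{p^*} = G_p[b_p = e_p], \qquad b_{p^*} = b_p, \qquad e_{p^*} = b_p, \qquad r_{p^*}(a,x) = r_p(a,x)$$

Given a program $p$, the forbidden region is the d-space $F_p \subseteq G_p$ defined by

$$F_p = \{x \in G_p \mid \exists a \in \mathcal{R},\ \kappa_a + r_p(a,x) < 0 \text{ or } r_p(a,x) > 0\}$$

The geometric realization of a process $p$ is defined as the d-space $H_p = G_p \setminus F_p$.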

We sometimes write $0$ and $\infty$ for the beginning and the end points respectively of a
geometric realization, and say that a path $p : I \to G_p$ is total when it has $0$ as source
and $\infty$ as target. It is easy to show that the geometric semantics of a program is well-defined
in the sense that two structurally congruent programs give rise to isomorphic
geometric realizations.
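The resource function and the forbidden region can be illustrated on a discretized version of the semantics. The sketch below makes several assumptions that are not in the text: threads are loop-free lists of actions, a state records how many actions each thread has already executed, and all names (`r_thread`, `r`, `forbidden`) are hypothetical. The sign convention (each $P_a$ counts for -1, each $V_a$ for +1) follows Definition 2.

```ocaml
type action = P of string | V of string

(* Contribution of one thread to the lock count of [a] after it has
   executed its first [k] actions: -1 per P_a, +1 per V_a. *)
let r_thread (a : string) (thread : action list) (k : int) : int =
  List.fold_left
    (fun acc (idx, act) ->
      if idx >= k then acc
      else match act with
        | P b when b = a -> acc - 1
        | V b when b = a -> acc + 1
        | _ -> acc)
    0
    (List.mapi (fun idx act -> (idx, act)) thread)

(* For a parallel composition, the contributions of the threads add up. *)
let r (a : string) (threads : action list list) (pos : int list) : int =
  List.fold_left2 (fun acc t k -> acc + r_thread a t k) 0 threads pos

(* A state is forbidden when some resource is locked more often than its
   capacity allows, or released more often than it was acquired. *)
let forbidden (capacity : (string * int) list) threads pos : bool =
  List.exists
    (fun (a, kappa) ->
      let v = r a threads pos in
      kappa + v < 0 || v > 0)
    capacity
```

With two threads $P_a.V_a$ and a mutex $a$, the state in which both threads have executed their $P_a$ is forbidden, whereas a state in which only one of them has is not.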

Example 1. The processes 

$$P_a.V_a \,|\, P_a.V_a \qquad\qquad P_a.P_b.V_b.V_a \,|\, P_b.P_a.V_a.V_b \qquad\qquad P_a.(V_a.P_a)^* \,|\, P_a.V_a$$

respectively have the following geometric realizations, which all consist of a space with 
some "holes", drawn in gray, induced by the forbidden region: 




The space in the middle is sometimes called the "Swiss flag" because of its form and is 
interesting because it exhibits both a deadlock and an unreachable region [13]. 



2 Computing trace spaces 



2.1 Trace spaces 

In topology, two paths $p$ and $q$ are often considered as equivalent when $q$ can be obtained
by continuously deforming $p$ (or vice versa), this equivalence relation being
called homotopy. The corresponding variant of this relation in the case of directed topological
spaces is called dihomotopy and is formally defined as follows. In the category
of d-spaces, the object $I$ is exponentiable, which means that to every d-space $Y$
one can associate a d-space $Y^I$ such that there is a natural bijection between morphisms
$X \times I \to Y$ and morphisms $X \to Y^I$. The underlying space of $Y^I$ is the set
of functions $I \to Y$ with the compact-open topology (also called uniform convergence
topology), and the directed paths $h : I \to Y^I$ are the functions such that $t \mapsto h(t)(u)$
is increasing for every $u \in I$. Finally, two paths are said to be dihomotopic when one
can be continuously deformed into the other:

Definition 3. The dihomotopy is defined as the smallest equivalence relation on paths
such that two directed paths $p, q : I \to X$ are dihomotopic when there exists a directed
path $h : I \to X^I$ with $p$ as source and $q$ as target.

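Example 2. In the geometric semantics of the program $P_b.V_b.P_a.V_a \,|\, P_a.V_a$, the two
paths above the hole are dihomotopic, whereas the path below is not dihomotopic to the
two others:

[Figure: the geometric realization, with one hole and three total paths; one axis is labelled with the actions $P_b$, $V_b$, $P_a$, $V_a$.]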

The intuition underlying the geometric semantics is that two dihomotopic paths cor- 
respond to execution traces differing by inessential commutations of instructions, thus 
giving rise to the same result. 

Given two points $x$ and $y$ of a d-space $X$, we write $X(x,y)$ for the subset of $X^I$
consisting of dipaths from $x$ to $y$. A trace is the equivalence class of a path modulo surjective
reparametrization, and a scheduling is the equivalence class of a trace modulo
dihomotopy. We write $T(X)(x,y)$ for the trace space obtained from $X(x,y)$ by identifying
paths equivalent up to reparametrization, and simply $T(X)$ for $T(X)(0,\infty)$. In
particular, we have $T(X)(x,y) \neq \emptyset$ if and only if there exists a directed path in $X$
going from $x$ to $y$.

In this section, we reformulate the algorithm for computing the trace space $T(X)$ up
to dihomotopy equivalence, originally introduced in [24], in order to achieve an efficient
implementation of it. For simplicity, we restrict here to spaces which are geometric
realizations of programs of the form

$$p = p_0 \,|\, p_1 \,|\, \cdots \,|\, p_{n-1} \qquad (1)$$

where the $p_i$ are built up only from 1, concatenation, resource locking and resource
unlocking (extending the algorithm to programs which may contain loops requires significant
generalizations which are described in Section 3). In this case, the geometric
realization is of the form

$$X = I^n \setminus \bigcup_{i=0}^{l-1} R^i$$

where $I^n$ denotes the cartesian product of $n$ copies of $I$, and each $R^i = \prod_{j=0}^{n-1} I^i_j$ is a
rectangle. We suppose here that each $R^i$ is homothetic to the $n$-dimensional open rectangle,
i.e. each directed interval $I^i_j$ is of the form $I^i_j = \,]x^i_j, y^i_j[$, and generalize this at
the end of the section. The restrictions on the form of the programs are introduced here
only to simplify our exposition: programs with choice can be handled by computing the
trace spaces on each branch, and programs with loops can be handled by suitably unfolding
the loops so that all the possible behaviors are exhibited (a detailed presentation of
this is given in Section 3, which will enable us to handle the full language). We suppose
fixed a program with $n$ threads and $l$ forbidden open rectangles, and consistently use
the notations above.

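Example 3. The geometric realizations of the programs

$$P_a.V_a.P_b.V_b \,|\, P_a.V_a.P_b.V_b \qquad \text{and} \qquad P_a.V_a.P_b.V_b \,|\, P_b.V_b.P_a.V_a$$

are respectively

[Figure: the two geometric realizations, each a square with two rectangular holes whose bounds $x^i_j$ and $y^i_j$ are marked on the axes.]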



2.2 The index poset 

Let us come back to the second program of Example 3. We will determine the different
traces, and their relationships in the trace space, by combinatorially looking at the way
they can turn around holes. To see this in that example, we extend each hole in parallel
to the axes, below or leftwards from the holes, until they reach the boundary of the state
space. These new obstructions force traces to go the other way around each hole:
the existence of deadlocks under these new constraints allows us to
determine whether traces going one way or the other around each hole exist. In fact,
this combinatorial information precisely computes all of the trace space [24].

In the second program of Example 3, there are four possibilities to extend once each 
of the two holes: 




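[Figure (2): the four spaces obtained by extending each of the two holes once, either leftwards or downwards; a total path is drawn in each of the first three.]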



Notice that there exists a total path in the first three spaces (as depicted above), whereas 
there is none in the last one. 



A simple way to encode the combinatorial information about the extension of holes
is through boolean matrices. We write $\mathcal{M}_{l,n}$ for the poset of $l \times n$ matrices, with $l$ rows
(the number of holes $R^i$) and $n$ columns (the dimension of the space, i.e. the number
of threads in the program), with coefficients in $\mathbb{Z}/2\mathbb{Z}$, with the pointwise ordering such
that $0 \leq 1$: we have $M \leq N$ whenever

$$\forall (i,j) \in [0 : l[ \,\times\, [0 : n[, \qquad M(i,j) \leq N(i,j) \qquad (3)$$

where $[m : n[$ denotes the set $\{m, \ldots, n-1\}$ of integers and $M(i,j)$ denotes the
$(i,j)$-th coefficient of $M$. We also write $\mathcal{M}^R_{l,n}$ for the subposet of $\mathcal{M}_{l,n}$ consisting of
matrices whose row vectors are all different from the zero vector, and $\mathcal{M}^C_{l,n}$ for the
subposet of $\mathcal{M}_{l,n}$ consisting of matrices whose column vectors are all unit vectors
(containing exactly one coefficient 1).
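The poset of boolean matrices and its two subposets can be sketched as follows. The representation by arrays of booleans and the function names are assumptions made for this illustration only.

```ocaml
(* An l x n boolean matrix, stored row by row. *)
type matrix = bool array array

(* Pointwise order (3): m <= n' iff every 1 of m is also a 1 of n'. *)
let leq (m : matrix) (n' : matrix) : bool =
  let ok = ref true in
  Array.iteri (fun i row ->
    Array.iteri (fun j b -> if b && not n'.(i).(j) then ok := false) row) m;
  !ok

(* Membership in the subposet of matrices with no zero row. *)
let nonzero_rows (m : matrix) : bool =
  Array.for_all (fun row -> Array.exists (fun b -> b) row) m

(* Membership in the subposet of matrices whose columns are unit vectors,
   i.e. each column contains exactly one coefficient 1. *)
let unit_columns (m : matrix) : bool =
  let n = if Array.length m = 0 then 0 else Array.length m.(0) in
  let count j =
    Array.fold_left (fun c row -> if row.(j) then c + 1 else c) 0 m in
  let rec go j = j >= n || (count j = 1 && go (j + 1)) in
  go 0
```

Note that a matrix can have unit columns and still contain a zero row, so the two subposets are distinct.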

Given a matrix $M \in \mathcal{M}_{l,n}$, we define $X_M$ as the subspace of $X$ obtained by extending
downwards each forbidden rectangle $R^i$ in every direction $j'$ different from $j$
for every $j$ such that $M(i,j) = 1$. Formally,

$$X_M = I^n \setminus \bigcup_{M(i,j) = 1} \tilde{R}^i_j$$

where $\tilde{R}^i_j = \prod_{j'=0}^{j-1} [0, y^i_{j'}[ \;\times\; ]x^i_j, y^i_j[ \;\times\; \prod_{j'=j+1}^{n-1} [0, y^i_{j'}[$, see (2) and Example 4 below.

In order to study whether there is a total path in the space associated to a matrix, we
define a map $\Psi : \mathcal{M}_{l,n} \to \mathbb{Z}/2\mathbb{Z}$ by $\Psi(M) = 1$ iff $T(X_M) = \emptyset$, i.e. there is no total
path in $X_M$. A matrix $M$ is dead when $\Psi(M) = 1$ and alive otherwise. The map $\Psi$ can
easily be shown to be order preserving.

Definition 4. We write

$$\mathcal{D}(X) = \{M \in \mathcal{M}^C_{l,n} \mid \Psi(M) = 1\}$$

for the set of (column) dead matrices and

$$\mathcal{C}(X) = \{M \in \mathcal{M}^R_{l,n} \mid \Psi(M) = 0\}$$

for the set of alive matrices (with non-empty rows), which is called the index poset; it
is implicitly ordered by the relation (3).

Example 4. In the example above, the four extensions of holes (2) are respectively
encoded by the following matrices:

[Four $2 \times 2$ boolean matrices, one per space of (2), each with exactly one coefficient 1 per row.]

The last matrix is dead and the three others are alive. The last matrix being dead indi- 
cates that there is no way a trace can pass left of the upper left hole and carry on passing 
below the lower right hole. 

A reason why the matrices in the index poset are convenient objects to study the 
schedulings is that they are topologically very simple [24]: 



Proposition 1. For any matrix $M \in \mathcal{M}^R_{l,n}$, the space $X_M(x,y)$ is either empty or
contractible: any two paths with the same source $x$ and target $y$ are dihomotopic. In
particular, for any matrix $M \in \mathcal{C}(X)$, the space $X_M(0,\infty)$ is always contractible.

Our main interest in the index poset is that it enables us to compute the schedulings
(i.e. maximal paths modulo dihomotopy) of the space: these schedulings are in bijection
with alive matrices in $\mathcal{C}(X)$ modulo an equivalence relation called connexity, which is
defined as follows. Given two matrices $M, N \in \mathcal{M}_{l,n}$, their intersection $M \wedge N$ is
defined as the matrix such that $(M \wedge N)(i,j) = \min(M(i,j), N(i,j))$.

Definition 5. Two matrices M and N are connected when their intersection does not 
contain any row filled with 0. 

The dihomotopy classes of total paths in X can finally be computed thanks to the fol- 
lowing property: 

Proposition 2. The connected components of $\mathcal{C}(X)$ are in bijection with schedulings
in $X$.
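Definition 5 and Proposition 2 suggest a direct way of counting schedulings once the alive matrices are known. The following sketch assumes matrices are arrays of booleans; the names `meet`, `connected` and `components` are hypothetical, and the quadratic label-merging loop is chosen for brevity rather than efficiency.

```ocaml
type matrix = bool array array

(* Intersection: coefficientwise minimum, i.e. logical "and". *)
let meet (m : matrix) (n' : matrix) : matrix =
  Array.mapi (fun i row -> Array.mapi (fun j b -> b && n'.(i).(j)) row) m

(* Definition 5: the intersection contains no row filled with 0. *)
let connected (m : matrix) (n' : matrix) : bool =
  Array.for_all (fun row -> Array.exists (fun b -> b) row) (meet m n')

(* Number of connected components of a list of matrices: each matrix
   repeatedly takes the smallest label found among its neighbours. *)
let components (ms : matrix list) : int =
  let a = Array.of_list ms in
  let k = Array.length a in
  let label = Array.init k (fun i -> i) in
  let changed = ref true in
  while !changed do
    changed := false;
    for i = 0 to k - 1 do
      for j = 0 to k - 1 do
        if connected a.(i) a.(j) && label.(j) < label.(i) then begin
          label.(i) <- label.(j);
          changed := true
        end
      done
    done
  done;
  List.length (List.sort_uniq compare (Array.to_list label))
```

On the six non-zero $1 \times 3$ matrices other than the all-ones matrix, this returns a single component.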

Example 5. Consider the program $p = q|q|q$ where $q = P_a.V_a$. The associated trace
space $X_p$ is a cube minus a cube (as shown in Example 8). The matrices in $\mathcal{C}(X_p)$ are

$$(1\ 0\ 0) \qquad (0\ 1\ 0) \qquad (0\ 0\ 1) \qquad (0\ 1\ 1) \qquad (1\ 0\ 1) \qquad (1\ 1\ 0)$$

and they are all (transitively) connected. For instance, $(0\ 1\ 1) \wedge (1\ 0\ 1) = (0\ 0\ 1)$. The
program $p$ thus has exactly one total scheduling, as expected.

Intuitively, alive matrices describe sets of dihomotopic total paths (Proposition 1) 
and the fact that two matrices have non-zero rows in their intersection means that there 
are paths which satisfy the constraints imposed by both matrices, i.e. the two matrices 
describe the same dihomotopy class of total paths. 

2.3 Computing dihomotopy classes 

The computation of the dihomotopy classes of total paths in the geometric semantics X 
of a given program will be performed in three steps: 

1. we compute the set $\mathcal{D}(X)$ of dead matrices,

2. we use $\mathcal{D}(X)$ to compute the index poset $\mathcal{C}(X)$,

3. we deduce the dihomotopy classes of total paths by quotienting $\mathcal{C}(X)$ by the connexity
relation.

These steps are detailed below. 

Given a subset $I$ of $[0 : l[$ and an index $j \in [0 : n[$, we write $y^I_j = \min\{y^i_j \mid i \in I\}$
(by convention $y^\emptyset_j = \infty$). Given a matrix $M \in \mathcal{M}_{l,n}$, we define the set of non-zero
rows of $M$ by $R(M) = \{i \in [0 : l[ \;\mid\; \exists j \in [0 : n[,\ M(i,j) \neq 0\}$. It can be shown that a
matrix $M$ is dead if and only if the space $X_M$ contains a deadlock. From the characterization
of deadlocks in geometric semantics given in [11], the following characterization
of dead matrices can therefore be deduced:



Proposition 3. A matrix $M \in \mathcal{M}^C_{l,n}$ is in $\mathcal{D}(X)$ iff it satisfies

$$\forall (i,j) \in [0 : l[ \,\times\, [0 : n[, \qquad M(i,j) = 1 \;\Rightarrow\; x^i_j < y^{R(M)}_j \qquad (4)$$
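Condition (4) can be checked directly by brute force. The sketch below is an illustration under assumptions, not the efficient implementation: hole bounds are given as arrays `x` and `y` indexed by hole and then by dimension, a matrix with unit columns is coded by an array `c` where `c.(j)` is the row carrying the 1 of column `j`, and the function names are hypothetical. The case of holes touching the boundary is not handled.

```ocaml
(* x.(i).(j) and y.(i).(j) are the lower and upper bounds of hole i in
   dimension j. The non-zero rows of the matrix coded by c are exactly
   the values occurring in c. *)
let is_dead (x : float array array) (y : float array array) (c : int array) =
  let n = Array.length c in
  (* minimum of y.(i).(j) over the non-zero rows i *)
  let ymin j = Array.fold_left (fun acc i -> min acc y.(i).(j)) infinity c in
  let rec check j = j >= n || (x.(c.(j)).(j) < ymin j && check (j + 1)) in
  check 0

(* Enumerate all l^n unit-column matrices and keep the dead ones. *)
let dead_matrices x y =
  let l = Array.length x and n = Array.length x.(0) in
  let rec go j c acc =
    if j = n then (if is_dead x y c then Array.copy c :: acc else acc)
    else begin
      let acc = ref acc in
      for i = 0 to l - 1 do
        c.(j) <- i;
        acc := go (j + 1) c !acc
      done;
      !acc
    end
  in
  go 0 (Array.make n 0) []
```

As a usage example, take two holes $]1,2[ \times ]3,4[$ and $]3,4[ \times ]1,2[$ (coordinates chosen here for illustration, one hole in the upper left and one in the lower right): three of the four unit-column matrices are then dead.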



Example 6. In the example below with $l = 2$ and $n = 2$, the matrix $M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ is
dead (we suppose that $x^i_j = 1 + i(j+1)$ and $y^i_j = 3 + i(j+1) - j$):

$$x^0_1 = 1 < 2 = y^{\{0,1\}}_1 \qquad\qquad x^1_0 = 2 < 3 = y^{\{0,1\}}_0$$

[Figure: the two holes, with the bounds $x^i_j$ and $y^i_j$ marked on the axes.]
The above proposition enables us to compute the set of dead matrices, for instance
by enumerating all matrices and checking whether they satisfy condition (4) (a more
efficient method is described in Section 2.4). From this set, the index poset $\mathcal{C}(X)$ can
be determined using the following property:

Proposition 4. A matrix $M \in \mathcal{M}^R_{l,n}$ is not in $\mathcal{C}(X)$ iff there exists a matrix $N \in \mathcal{D}(X)$
such that $N \leq M$. In other words, $M \in \mathcal{C}(X)$ iff for every matrix $N \in \mathcal{D}(X)$ there
exist indexes $i \in [0 : l[$ and $j \in [0 : n[$ such that $M(i,j) = 0$ and $N(i,j) = 1$.
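Proposition 4 gives an immediate, if naive, membership test for the index poset. The sketch below assumes matrices are arrays of booleans and enumerates all $2^{l \cdot n}$ matrices, which is only reasonable for very small examples; the function names are hypothetical.

```ocaml
type matrix = bool array array

(* Pointwise order (3). *)
let leq (m : matrix) (n' : matrix) : bool =
  let ok = ref true in
  Array.iteri (fun i row ->
    Array.iteri (fun j b -> if b && not n'.(i).(j) then ok := false) row) m;
  !ok

let nonzero_rows (m : matrix) : bool =
  Array.for_all (fun row -> Array.exists (fun b -> b) row) m

(* Proposition 4: a matrix with non-zero rows is alive iff no dead
   matrix lies below it. *)
let alive (dead : matrix list) (m : matrix) : bool =
  nonzero_rows m && not (List.exists (fun d -> leq d m) dead)

(* All l x n boolean matrices, by decoding the integers 0 .. 2^(l*n) - 1. *)
let all_matrices l n : matrix list =
  List.init (1 lsl (l * n)) (fun code ->
    Array.init l (fun i ->
      Array.init n (fun j -> (code lsr (i * n + j)) land 1 = 1)))

let index_poset (dead : matrix list) l n : matrix list =
  List.filter (alive dead) (all_matrices l n)
```

With the three dead matrices whose rows are (1 1 / 0 0), (0 0 / 1 1) and (1 0 / 0 1), exactly three of the nine $2 \times 2$ matrices with non-zero rows are alive.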

Notice that the poset $\mathcal{C}(X)$ is downward closed (because $\Psi$ is order preserving)
and one is naturally interested in the subset $\mathcal{C}_{\max}(X)$ of maximal matrices in order to
describe it. Proposition 4 provides a simple-minded algorithm for computing (maximal)
matrices in $\mathcal{C}(X)$. We write $\mathcal{D}(X) = \{D_0, \ldots, D_{p-1}\}$. We then compute the sets $C_k$ of
maximal matrices $M$ such that for every $i \in [0 : k[$ we have $D_i \not\leq M$. We start from the
set $C_0 = \{\mathbf{1}\}$ where $\mathbf{1}$ is the matrix containing only 1 as coefficients. Given a matrix $M$,
we write $M^{\neg(i,j)}$ for the matrix obtained from $M$ by replacing the $(i,j)$-th coefficient
by $1 - M(i,j)$. The set $C_{k+1}$ is then computed from $C_k$ by doing the following for all
matrices $M \in C_k$ such that $D_k \leq M$:

1. remove $M$ from $C_k$,

2. for every $(i,j)$ such that $D_k(i,j) = 1$,

- remove every matrix $N \in C_k$ such that $N \leq M^{\neg(i,j)}$,

- if there exists no matrix $N \in C_k$ such that $M^{\neg(i,j)} \leq N$, add $M^{\neg(i,j)}$ to $C_k$.

The set $\mathcal{C}_{\max}(X)$ is obtained as $C_p$. If we remove the second point and replace it by

2'. for every $(i,j)$ such that $D_k(i,j) = 1$ and $M^{\neg(i,j)} \in \mathcal{M}^R_{l,n}$, add $M^{\neg(i,j)}$ to $C_k$.

we compute a set $C_p$ such that $\mathcal{C}_{\max}(X) \subseteq C_p \subseteq \mathcal{C}(X)$, which is enough to compute
connected components and has proved faster to compute in practice.
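The simplified variant of the procedure (with point 2' in place of point 2) can be sketched as a fold over the dead matrices. This is an illustration under assumptions: matrices are arrays of booleans, the names `flip`, `step` and `candidates` are hypothetical, and duplicates are removed with `List.sort_uniq`, a detail the text does not specify.

```ocaml
type matrix = bool array array

let leq (m : matrix) (n' : matrix) : bool =
  let ok = ref true in
  Array.iteri (fun i row ->
    Array.iteri (fun j b -> if b && not n'.(i).(j) then ok := false) row) m;
  !ok

let nonzero_rows (m : matrix) : bool =
  Array.for_all (fun row -> Array.exists (fun b -> b) row) m

(* Copy of m with the coefficient (i,j) negated. *)
let flip (m : matrix) i j : matrix =
  let m' = Array.map Array.copy m in
  m'.(i).(j) <- not m'.(i).(j);
  m'

(* One round: every matrix above the dead matrix d is removed and replaced
   by its flips at the positions where d has a 1, keeping only the flips
   that still have non-zero rows. *)
let step (ck : matrix list) (d : matrix) : matrix list =
  let above, rest = List.partition (fun m -> leq d m) ck in
  let flips m =
    let acc = ref [] in
    Array.iteri (fun i row ->
      Array.iteri (fun j b ->
        if b then begin
          let m' = flip m i j in
          if nonzero_rows m' then acc := m' :: !acc
        end) row) d;
    !acc
  in
  List.sort_uniq compare (rest @ List.concat (List.map flips above))

(* Start from the all-ones matrix and process the dead matrices in turn. *)
let candidates (dead : matrix list) l n : matrix list =
  List.fold_left step [Array.make_matrix l n true] dead
```

Every matrix returned avoids all dead matrices: a flip at a position where the current dead matrix has a 1 is no longer above it, and it stays below a matrix that already avoided the previous ones.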



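Example 7. Consider again Example 3. The algorithm starts with

$$C_0 = \left\{ M_0 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \right\}$$

For $C_1$, we must have $D_0 \not\leq M$, so we swap any of the two ones in the first row:

$$C_1 = \left\{ M_1 = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix},\ M_2 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \right\}$$

Similarly for $C_2$, we have to swap the bits on the second row so that $D_1 \not\leq M$:

$$C_2 = \left\{ M_3 = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix},\ M_4 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\ M_5 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\ M_6 = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} \right\}$$

Finally, we have $D_2 \not\leq M_i$, excepting $D_2 \leq M_5$, so we swap the bits in position (1, 1)
and in position (2, 2):

$$M_5^{\neg(1,1)} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \leq M_3 \qquad\qquad M_5^{\neg(2,2)} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \leq M_6$$

Since we are only interested in maximal matrices, we end up with $C_3 = \{M_6, M_4, M_3\}$.
The trace spaces corresponding to those matrices are the three first depicted in (2). None
of those matrices being connected, the trace space up to dihomotopy consists of exactly
3 distinct points.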

Other implementations of the algorithm can be obtained by reformulating the computation
of $\mathcal{C}_{\max}(X)$ as finding a minimal transversal in a hypergraph, for which efficient
algorithms have been proposed [21].

We have supposed up to now that the forbidden region was a union of rectangles $R^i$,
each such rectangle being a product of open intervals $I^i_j = \,]x^i_j, y^i_j[$. The algorithm given
above can easily be generalized to the case where the rectangles $R^i$ can "touch the
boundary" in some dimensions, i.e. the intervals $I^i_j$ are either of the form $]x^i_j, y^i_j[$ or
$[0, y^i_j[$ or $]x^i_j, \infty]$ or $[0, \infty]$. For example, the process $P_a.V_a \,|\, P_a.V_a \,|\, P_a.V_a$, with $\kappa_a = 1$,
generates such a forbidden region. We write $B \in \mathcal{M}_{l,n}$ for the boundary matrix, which
is the matrix such that $B(i,j) = 0$ whenever $x^i_j = 0$ (i.e. the $i$-th interval touches the
lowest boundary in dimension $j$) and $B(i,j) = 1$ otherwise. The matrices of $\mathcal{D}(X)$ are
the matrices $M \in \mathcal{M}_{l,n}$ of the form $M = N \wedge B$, for some matrix $N \in \mathcal{M}^C_{l,n}$, which
satisfy (4) and such that

$$\forall j \in C(M), \qquad y^{R(M)}_j = \infty \qquad (5)$$

where $C(M)$ is the set of indexes of null columns of $M$.

2.4 An efficient implementation 

In order to compute the set $\mathcal{D}(X)$ of dead matrices, the general idea is to enumerate all
the matrices $M \in \mathcal{M}^C_{l,n}$ and check whether they satisfy the condition (4). Of course, a
direct implementation of this idea would be highly inefficient since there are $l^n$ matrices
in $\mathcal{M}^C_{l,n}$. In order to improve this, we try to detect "as soon as possible" when a matrix



    let rec compute_dead j m rows yrows =
      if j = n then dead := m :: !dead else
      for i = 0 to l - 1 do
        try
          let changed_rows = not (Set.mem i rows) in
          let rows = Set.add i rows in
          let m = Array.copy m in
          (* fill column j, taking the boundary matrix into account *)
          if bounds.(i).(j) = 1 then m.(j) <- None else m.(j) <- Some i;
          (match m.(j) with
           | Some i -> if x.(i).(j) > yrows.(j) then raise Exit
           | None -> if yrows.(j) <> infinity then raise Exit);
          let yrows =
            let j' = j in
            if not changed_rows then yrows else
              (* row i is new: update the minima, re-checking fixed columns *)
              Array.mapi (fun j yrj ->
                if yrj < y.(i).(j) then yrj else
                match m.(j) with
                | None ->
                  if j < j' && y.(i).(j) <> infinity then raise Exit; y.(i).(j)
                | Some i' ->  (* i' is the row already chosen for column j *)
                  if x.(i').(j) > y.(i).(j) then raise Exit; y.(i).(j)
              ) yrows
          in
          compute_dead (j + 1) m rows yrows
        with Exit -> ()
      done

Fig. 1. Algorithm for computing dead matrices.

does not satisfy the condition: we first fix the coefficient in the first column of $M$ and
check whether it is possible for a matrix with this first column to be dead, then we fix
the second column and so on. In fact, we have to check that every coefficient $(i,j)$
such that $M(i,j) = 1$ satisfies $x^i_j < y^{R(M)}_j$. Now, suppose that we know some of the
coefficients $(i,j)$ for which $M(i,j) = 1$. We therefore know a subset $I \subseteq R(M)$ of
the non-zero rows. If for one of these coefficients we have $x^i_j \geq y^I_j$, we know that the
matrix cannot satisfy the condition (4) because $x^i_j \geq y^I_j \geq y^{R(M)}_j$. A similar reasoning
can be held for condition (5).

The actual function computing the dead matrices is presented in Figure 1, in pseudo-OCaml
code. This recursive function fills the $j$-th column of the matrix $M$ (whose columns
with index below $j$ are supposed to be already fixed) and performs the check: it tries
to set the $i$-th coefficient to 1 (and all the others to 0) for every $i \in [0 : l[$. If a matrix
beginning as $M$ (up to the $j$-th column) cannot be dead, the computation is aborted
by raising the Exit exception. When all the columns have been computed the matrix is
added to the list dead of dead matrices. Since a matrix $M \in \mathcal{M}^C_{l,n}$ has at most one
non-null coefficient in a given column, it will be coded as an array of length $n$ whose
$j$-th element is either None when all the elements of the $j$-th column are null, or Some i
when the $i$-th coefficient of the $j$-th column is 1 and the others are 0. The argument
rows is the set of indexes of known non-null rows of $M$ and yrows is an array of
length $n$ such that yrows.(j) $= y^{\mathrm{rows}}_j$. The matrix bounds is the matrix previously
noted $B$ used to perform the check (5). Notice that the algorithm takes advantage of the
fact that when the coefficient $i$ chosen for the $j$-th column is already in rows (i.e. when
the variable changed_rows is false) then many computations can be spared because
the coefficients $y^{\mathrm{rows}}_j$ are not changed.

Once the set of dead matrices is computed, the set $\mathcal{C}(X)$ of alive matrices is then computed
using the naive algorithm of Section 2.3, exemplified in Example 7. We have also
implemented a simple hypergraph transversal algorithm [2] but it did not bring significant
improvements; more elaborate algorithms might give better results though. Finally,
the representatives of traces are computed as the connected components (in the sense of
Proposition 2) of $\mathcal{C}(X)$, in a straightforward way. An explicit sequence of instructions
corresponding to every representative $M$ can easily be computed: it corresponds to the
sequence of instructions crossed by any increasing total path in the d-space $X_M$.

2.5 An example: the n dining philosophers 

In order to illustrate the performance of our algorithm, we present below the computation
times for the well-known $n$ dining philosophers program [9], whose number of schedulings
is in $O(2^n)$ and which thus pushes any algorithm determining the essential schedulings
to its (exponential) limits. It is constituted of $n$ processes $p_k$ in parallel, using $n$
mutexes $a_k$, defined by $p_k = P_{a_k}.P_{a_{k+1}}.V_{a_k}.V_{a_{k+1}}$, where the indexes on mutexes $a_k$
are taken modulo $n$. Such a program generates $2^n - 2$ distinct schedulings, which our
program finds correctly. The table below summarizes the execution time and memory
consumption for our tool ALCOOL (programmed in OCaml), as well as for the model
checker SPIN [1] implementing partial order reduction techniques. Although SPIN is
not significantly slower, it consumes much more memory and starts to use swap from
$n = 12$ (thus failing to give an answer in a reasonable time for $n > 12$). Notice that
the implementation of SPIN is finely tuned and also benefits from gcc optimizations,
whereas there is room for many improvements in ALCOOL. In particular, most of the
time is spent in computing dead matrices and the algorithm of Section 2.4 could be improved
by finding a heuristic to suitably sort holes so that failures to satisfy condition (4)
are detected earlier. The present algorithm is also significantly faster than some of the
authors' previous contribution [16]: for instance, the latter was unable to generate these
maximal dipaths because of memory requirements, for $n$ philosophers with $n > 8$ (in the
benchmarks of [16], it was taking already 13739s, on a 1GHz laptop computer though,
to generate just the component category for 9 philosophers).
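The benchmark program is easy to generate for any $n$. The following sketch builds the $n$ threads as lists of actions; numbering mutexes by integers and the function names are assumptions made for illustration, and the last function merely restates the count $2^n - 2$ given in the text rather than computing schedulings.

```ocaml
(* Mutexes are numbered 0 .. n-1. *)
type action = P of int | V of int

(* Thread p_k = P_{a_k}.P_{a_{k+1}}.V_{a_k}.V_{a_{k+1}}, indices modulo n. *)
let philosopher n k : action list =
  let a = k and b = (k + 1) mod n in
  [P a; P b; V a; V b]

(* The whole program p_0 | ... | p_{n-1}, one action list per thread. *)
let philosophers n : action list list = List.init n (philosopher n)

(* Number of distinct schedulings stated in the text: 2^n - 2. *)
let expected_schedulings n = (1 lsl n) - 2
```

For three philosophers, the last thread locks mutex 2 and then mutex 0, which closes the cycle responsible for the deadlock.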
Since the size of the output is generally exponential in the size of the input, there is
no hope of finding an algorithm which has less than an exponential worst-case complexity
(which our algorithm clearly has). However, since our goal is to program actual tools
to verify concurrent programs, practical improvements in the execution time or memory
consumption are really interesting from this point of view. We have of course tried
our tool on many more examples, which confirm the improvement trend, and shall be
presented in a longer version of the article.

3 Programs with loops 
3.1 Paths in deloopings 

One of the most challenging parts of verifying concurrent programs consists in verifying
programs with loops, since those contain a priori an infinite number of possible execution
traces. We extend here the previous methodology and, given a program containing
loops, we compute a (finite!) automaton whose accepted paths describe the schedulings
of the program: this automaton can thus be considered as a control flow graph of the
concurrent program. Of course, we are then able to use the traditional methods in static
analysis, such as abstract interpretation, to study the program (this is briefly presented
in Section 3.5). This section builds on some ideas being currently developed by Fajstrup
[10]; however, most of the properties presented in this section are entirely new. To
the best of our knowledge, this is the first work in which geometric methods are used in
order to devise a practical algorithm to handle programs containing loops. A particularly
interesting feature of our method lies in the fact that it considers the broad "geometry
of holes" and can thus associate a small control flow graph to a given program, see
Section 3.4.

In the following, we suppose fixed a program of the form $p = p_0 | p_1 | \cdots | p_{n-1}$ as
in (1), with n threads. We write

    $p^* = p_0^* | p_1^* | \cdots | p_{n-1}^*$

for the associated "looping program". Our goal in this section is to describe the
schedulings of such a program $p^*$ (the restriction on the form of the programs considered here
was only made to simplify our presentation, and the methodology can be extended to
handle all well-bracketed programs generated by the grammar, without any essential
technical difficulty added). Following Section 1.2, its geometrical semantics consists of
an n-dimensional torus with rectangular holes. As previously, for simplicity, we
suppose that these holes do not intersect the boundaries, i.e. that p satisfies the hypothesis
of Section 2.1. Given an n-dimensional vector $v = (v_0, \ldots, v_{n-1})$ with coefficients
in N, the v-delooping of p, written $p^v$, is the program
$p_0^{v_0} | p_1^{v_1} | \cdots | p_{n-1}^{v_{n-1}}$, where $p_j^{v_j}$
denotes the concatenation of $v_j$ copies of $p_j$. A scheduling in $p^*$ is a scheduling in the
previous sense (i.e. a total path modulo homotopy) in $p^v$ for some vector v.

Example 8. Consider the program p = q|q|q of Example 5, where $q = P_a.V_a$. Its
geometric realization $X_p$ is pictured on the left, and its (3, 2, 2)-delooping $X_{p^{(3,2,2)}}$ is
pictured on the right.

[Figure: the geometric realization $X_p$ (left) and its (3, 2, 2)-delooping (right).]

Given two spaces X and Y which are hypercubes with holes (which is the case for
the geometric realizations of the programs we are considering here), we write $X \oplus_j Y$
for the space obtained by identifying the j-th target face of the hypercube X with the
j-th source face of the hypercube Y, and call it the j-gluing of X and Y. Formally,
this can be defined as in Section 1.2 as $X \oplus_j Y = (X \uplus Y)/{\sim}$, where the relation $\sim$
identifies points $x \in X$ and $y \in Y$ such that $x_j = \infty$, $y_j = 0$ and $x_{j'} = y_{j'}$ for every
dimension $j' \neq j$, and directed paths are defined in a similar fashion. Notice that, by
definition, there is a canonical embedding of X (resp. Y) into $X \oplus_j Y$, which will allow
us to implicitly consider X (resp. Y) as a subspace of $X \oplus_j Y$ in the following.

Example 9. The (3, 2, 2)-delooping of Example 8 is

    $X_{p^{(3,2,2)}} = (Y \oplus_1 Y) \oplus_2 (Y \oplus_1 Y)$  with  $Y = X_p \oplus_0 X_p \oplus_0 X_p$

More generally, any v-delooping $p^v$ of a program p of the form (1) can be obtained by
gluing copies $X_p^w$ of $X_p$, indexed by a vector w such that for every dimension i with
$0 \leq i < n$, we have $0 \leq w_i < v_i$ (what we will simply write $0 \leq w < v$).
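
The delooping and the indexing of its copies can be illustrated by a minimal Python sketch. The representation of a thread as a list of action names, and the function names deloop and copy_indices, are assumptions made for the illustration only; they are not taken from the implementation discussed in this paper.

```python
from itertools import product


def deloop(threads, v):
    """Return the v-delooping of a program.

    `threads` is a list of threads, each one a list of action names
    (an illustrative encoding), and `v` gives the number of copies per thread.
    Thread j of the result is the concatenation of v[j] copies of thread j.
    """
    if len(threads) != len(v):
        raise ValueError("one multiplicity per thread is required")
    return [list(thread) * vj for thread, vj in zip(threads, v)]


def copy_indices(v):
    """Enumerate the vectors w with 0 <= w < v componentwise.

    Each such w indexes one copy of X_p in the gluing that forms X_{p^v}.
    """
    return list(product(*(range(vj) for vj in v)))


# The program p = q|q|q with q = P_a.V_a, and its (3, 2, 2)-delooping.
q = ["P(a)", "V(a)"]
p = [q, q, q]
p_322 = deloop(p, (3, 2, 2))
copies = copy_indices((3, 2, 2))
```

With these conventions, the (3, 2, 2)-delooping is made of 3 x 2 x 2 = 12 copies of X_p, one per index vector w.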

Given two scheduling matrices M and N encoding extensions of holes of such a
program p (cf. Section 2.2), we reuse the notation and write $M \oplus_j N$ for the obvious
matrix coding extension of holes in the space $X_p \oplus_j X_p$. At this point, it is crucial to
notice that the holes described by N in the second copy of $X_p$ can have an effect on
the first copy of $X_p$ (when they are extended to 0 in the direction j), what we call the
j-shadow of N, and write $X_{N\downarrow_j}$.

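Example 10. With the program p of Example 8, consider the matrices M = (1 0 0)
and N = (0 1). We have $M \oplus_0 N$ = [matrix not recoverable from the source]. The space
$X_{M \oplus_0 N}$ is pictured on the left, and the 0-shadow $X_{N\downarrow_0}$ of N is pictured on the right:

[Figure: the space $X_{M \oplus_0 N}$ (left) and the 0-shadow of N (right).]
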
The above example makes clear that the space corresponding to a scheduling $M \oplus_j N$
is of the form $X_{M \oplus_j N} = (X_M \cap X_{N\downarrow_j}) \oplus_j X_N$, i.e. the holes in the first copy come
either from M or from shadows of N. Moreover, the holes in the space $X_{N\downarrow_j}$ are
hypercubes which are products of intervals of the form $\prod_{0 \leq i < n} I_i$, where each interval $I_i$
is of the form $]x_i, y_i[$ or $[0, y_i[$ or $[0, \infty]$, with $0 \leq i < n$. The shadows can therefore
be coded as matrices (using a slightly different coding from the one used up to now,
the precise way they are coded being quite irrelevant) and we write $N{\downarrow}_j$ for the matrix
coding the j-shadow of N, which can easily be computed from N and j. A scheduling
matrix M can obviously be seen as a particular "shadow", enabling us to use the
same notation for both, and we write $M \cup N$ for the union of two shadows M and N,
so that $X_{M \cup N} = X_M \cap X_N$. Finally, given a shadow M, the algorithm described in
Section 2.3 can easily be adapted to the new coding in order to determine whether the
space $X_M$ is alive.
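
To make the notion of shadow more concrete, here is a minimal Python sketch working directly on holes rather than on matrices. It relies on several assumptions of ours: a hole is coded as a tuple of (lower, upper) bounds, one per direction, the distinction between open and closed bounds is ignored, and a hole of the second copy is taken to cast a j-shadow exactly when its interval in direction j starts at 0. The names shadow_of_hole and shadow are hypothetical, and the matrix coding actually used by the algorithm is not reproduced here.

```python
INF = float("inf")


def shadow_of_hole(hole, j):
    """j-shadow cast on the previous copy by one hole of the next copy.

    `hole` is a tuple of (lower, upper) pairs, one per direction.
    Assumption: only a hole extended to 0 in direction j reaches the glued
    face; its shadow then spans the whole previous copy in direction j,
    i.e. the interval [0, oo], and keeps its other intervals unchanged.
    """
    lower, _ = hole[j]
    if lower != 0:
        return None  # the hole does not touch the j-th source face
    return tuple((0, INF) if i == j else interval
                 for i, interval in enumerate(hole))


def shadow(holes, j):
    """j-shadow of a collection of holes, as a list of holes."""
    candidates = (shadow_of_hole(hole, j) for hole in holes)
    return [s for s in candidates if s is not None]


# A three-dimensional hole extended to 0 in direction 0 only.
hole = ((0, 2), (1, 2), (1, 2))
in_direction_0 = shadow([hole], 0)
in_direction_1 = shadow([hole], 1)
```

In this example the hole casts a shadow in direction 0, which covers the whole range of the first coordinate, and no shadow in direction 1.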



3.2 The shadow automaton 



The trace space of a program $p^*$ is not finite in the general case. We show here that it
can however be described as the set of paths of an automaton that we call the shadow
automaton: this automaton provides us with a finite presentation of the set of
schedulings.

Consider the v-delooping $p^v$ of a program p. The space $X_{p^v}$ consists of the gluing of
copies $X_p^w$ of $X_p$ indexed by vectors w such that $0 \leq w < v$ and similarly, a scheduling M
of $X_{p^v}$ consists of the gluing of matrices $M^w$. Clearly, if some submatrix $M^w$ is dead
then the whole matrix M is dead:

Lemma 1. If a matrix M is alive then all its submatrices $M^w$ are alive.

However, the converse is not true because a scheduling $M^w$ might create a deadlock
with the shadows coming from matrices above it. For instance in Example 8, the
matrix $M = (1\ 0\ 0) \oplus_0 (0\ 1\ 1)$ is not alive because the space $X_{M^{(0,0,0)}}$ induced by the
submatrix $M^{(0,0,0)}$ is contained in the space $X_N$, where N = (1 1 1) is a dead matrix:

[Figure: the space induced by the submatrix $M^{(0,0,0)}$, contained in $X_N$.]


In order to generate all the possible schedulings $M^w$ visited by a total path in $X_{p^v}$,
we therefore have to take into account the shadows dropped by the schedulings of copies of $X_p$
in its future. We will construct an automaton which considers the visited schedulings
of the path, starting from the end, and maintains the shadow they produce on the next
state in a given direction j, so that we can compute the possible previous matrices in
direction j such that the whole matrix is not dead. Formally,

Definition 6. The shadow automaton of a program p is a non-deterministic automaton
whose

- states are shadows

- transitions $N \xrightarrow{j,M} N'$ are labeled by a direction j (with $0 \leq j < n$) and a
  scheduling M

defined as the smallest automaton

- containing the empty scheduling

- and such that for every state N', for every direction j and for every scheduling M
  such that the scheduling $M \cup N'$ is alive, and M is maximal with this property,
  there is a transition $N \xrightarrow{j,M} N'$ with $N = (M \cup N'){\downarrow}_j$.

All the states of the automaton are both initial and final.
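
The "smallest automaton" of Definition 6 can be computed by a standard worklist saturation, sketched below in Python. This is only an illustration under stated assumptions: the geometric operations (union of shadows, aliveness test, j-shadow, order on schedulings) are passed as parameters and are not implemented here, shadows are assumed to be hashable values, and the function and parameter names are hypothetical. Termination relies on the set of reachable shadows being finite.

```python
from collections import deque


def shadow_automaton(empty, directions, schedulings,
                     union, is_alive, shadow, leq):
    """Saturate the set of states and transitions of Definition 6.

    empty       -- the empty scheduling (first state of the automaton)
    directions  -- the directions j, e.g. range(n)
    schedulings -- the candidate scheduling matrices M
    union, is_alive, shadow, leq -- oracles for M u N, aliveness,
                   the j-shadow and the order on schedulings (assumed given)

    Returns (states, transitions) where a transition is a tuple
    (source, j, M, target) standing for  source --(j, M)--> target.
    """
    states = {empty}
    transitions = set()
    todo = deque([empty])
    while todo:
        target = todo.popleft()  # the state N' of the definition
        alive = [m for m in schedulings if is_alive(union(m, target))]
        # Keep only the schedulings that are maximal among the alive ones.
        maximal = [m for m in alive
                   if not any(m != other and leq(m, other) for other in alive)]
        for m in maximal:
            for j in directions:
                source = shadow(union(m, target), j)  # N = (M u N') shadowed in direction j
                transitions.add((source, j, m, target))
                if source not in states:
                    states.add(source)
                    todo.append(source)
    return states, transitions
```

The exploration proceeds backwards, from the target of a transition to its source, which matches the idea of reading the visited schedulings starting from the end of the path.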

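Example 11. Consider the program p = q|q with $q = P_a.V_a$ whose geometric
semantics is a square with a square hole. The associated shadow automaton is

[Figure: the shadow automaton of p; its states and the schedulings labeling its
transitions are drawn as pictograms of the square with its extended hole.]

For instance, a transition of this automaton is computed as follows: we take the shadow M
obtained as the union of the scheduling labeling the transition with the shadow of its target
state, and compute its shadow in direction 0, i.e. on the left, to obtain
the source of the transition.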

The interest of the automaton lies in the fact that it fully describes the possible
schedulings crossed by a total path in a scheduling of a delooping $X_{p^v}$:

Theorem 1. Suppose that M is a scheduling of $X_{p^v}$, obtained by gluing
schedulings $M^w$ of $X_p$. Then there exists a total path in $X_M$ going through the subspaces
$X_{M^{w_0}}, X_{M^{w_1}}, \ldots, X_{M^{w_m}}$ in this order, such that $w_k$ and $w_{k+1}$ only differ by one
coordinate $j_k$ (i.e. the path exits from $X_{M^{w_k}}$ through its $j_k$-th face), if and only if there
exists a path labeled as follows in the shadow automaton:

    $N_0 \xrightarrow{j, M^{w_0}} N_1 \xrightarrow{j_0, M^{w_1}} N_2 \cdots N_m \xrightarrow{j_{m-1}, M^{w_m}} N_{m+1}$

for some states $N_i$ and dimension j.



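Example 12. With the program p of Example 11, the following paths in the (2, 2)-delooping

[Figure: two total paths in the (2, 2)-delooping of p.]

are respectively witnessed by the following paths of the shadow automaton:

[Figure: the two corresponding paths in the shadow automaton.]

3.3 Reducing the size of the shadow automaton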

The size of the shadow automaton grows very quickly when the complexity of the trace
space grows. For instance, for the program p of Example 8, the shadow automaton
already has 19 states and 80 transitions. We describe here some ways to reduce the
automaton while preserving Theorem 1. Namely, we should remark that the automaton
is not minimal in the following sense. By Proposition 1, given a scheduling M, any two total
paths in $X_M$ are necessarily homotopic: an alive scheduling thus describes a homotopy
class of total paths. By Theorem 1, the schedulings "visited" by a total path in $X_{p^v}$ are
described by a path in the shadow automaton, therefore every homotopy class of total
paths in $X_{p^v}$ is described by at least one path in the shadow automaton. The shadow
automaton is not minimal in the sense that generally, a homotopy class is described by
more than one path in the shadow automaton.

Determinization. First, our non-deterministic automaton can be determinized using
classical algorithms of automata theory, which in practice greatly reduces its size:
the determinized automaton for the program of Example 8 has only 4 states and 24
transitions.
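
For reference, the classical subset construction can be sketched as follows in Python. The sketch assumes that transitions are given as triples (source, label, target), where a label stands for a pair (direction, scheduling), and that every state of the non-deterministic automaton is initial, as in Definition 6; the function name determinize is ours. It only builds the reachable subsets and performs no minimization.

```python
def determinize(states, transitions):
    """Subset construction for an automaton whose states are all initial.

    `transitions` is an iterable of (source, label, target) triples.
    Returns (initial, dfa_states, dfa_transitions), where dfa_states is a set
    of frozensets of original states and dfa_transitions maps
    (subset, label) to the unique successor subset.
    """
    transitions = list(transitions)
    successors = {}
    for source, label, target in transitions:
        successors.setdefault((source, label), set()).add(target)
    labels = {label for _, label, _ in transitions}

    initial = frozenset(states)  # every state is initial
    dfa_states = {initial}
    dfa_transitions = {}
    todo = [initial]
    while todo:
        subset = todo.pop()
        for label in labels:
            image = frozenset(t for s in subset
                              for t in successors.get((s, label), ()))
            if not image:
                continue  # no transition with this label from this subset
            dfa_transitions[(subset, label)] = image
            if image not in dfa_states:
                dfa_states.add(image)
                todo.append(image)
    return initial, dfa_states, dfa_transitions
```

Since all the states are final, every reachable subset of the result is final as well, so only the transition structure needs to be computed.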

Example 13. The determinized automata for Examples 11 and 8 are respectively:

[Figure: the two determinized automata. Their transitions are labeled by a direction
and one of the schedulings $M_0$, $M_1$, $M_2$, which are drawn as pictograms.]

where "_" means any direction j. The state I is initial and all the states are final.

Quotient under connexity. A way to further reduce the automaton consists in
quotienting the scheduling matrices labeling the arrows of the automaton under the connexity
relation of Definition 5 before determinizing the automaton, which is formally justified
by Proposition 2.



Example 14. The shadow automaton corresponding to the program of Example 8,
quotiented under connexity, determinized and minimized, is simply the automaton with a single
state I and a loop labeled "_, M" on it,
where $M = M_1 = M_2 = M_3$ up to connexity (the matrices $M_i$ are those defined in
Example 13).



We are currently investigating further conditions in order to construct the minimal 
automaton describing the trace space associated to a looping program, but the condi- 
tions mentioned above are already providing us with promisingly small automata. 

3.4 Preliminary implementation and benchmark 

A preliminary implementation of the computation of the shadow automaton was done.
The algorithm implemented is currently quite simple, but we plan to generalize the
algorithm of Section 2.4 soon, which is not complicated from a theoretical point of
view but much more involved technically, in order to achieve better performance. Most
experiments led so far are already promising and make it clear that taking into account
the geometry of the state-space enables us to reduce, sometimes drastically, the size of
the control flow graph corresponding to the program to be analyzed.



Example 15. The two-phase locking protocol is a simple discipline for distributed
databases, in which the processes first lock all the mutexes for the resources they are going
to use and free all of them in the end [20]. This can be modeled as a program $q_{n,l}$
consisting of n copies of the process $p = P_{a_1}.\cdots.P_{a_l}.V_{a_1}.\cdots.V_{a_l}$ in parallel (each of
these processes is using l resources). For instance, the geometric semantics of $q_{2,2} = p|p$
is shown below. Notice that this state space is equivalent to a space with only one hole
up to dihomotopy. More generally, given $l \geq 1$, it can be shown that the geometric
semantics of $q_{n,l}$ is equivalent to that of $q_{n,1}$, which our algorithm is able to take into
account! Namely, the size of the shadow automaton associated to $q_{n,l}$ only depends on n
whereas the number of states of the automaton produced by SPIN is exponential in l
(with n fixed). Below are presented the sizes (states, transitions) of the non-deterministic
automaton (s, t), of the determinized automaton (s', t') and of SPIN's automaton ($s_{SPIN}$, $t_{SPIN}$)
for the two-phase locking process described in Example 15, for some values of n and l.

3.5 An Application to static analysis

Now that we have the reduced shadow automaton, we can explain how one can perform
static analysis by abstract interpretation [5] on concurrent systems, in an economical
way. The systematic design and proof of correctness of such an abstract analysis is left for
a future article; the aim of this section is to give an intuition of why the computations of
Section 3 are relevant to static analysis by abstract interpretation. The idea is to
associate, to each node n of the shadow automaton, a set of values $A_n$ that program variables
can take if computation follows a transition path whose last vertex is n. Among the
actions the program can take along this scheduling, we consider only the greedy ones,
that is the ones which execute all possible actions permitted by the dihomotopy class of
schedulings ending at n.

Suppose that we want to analyze the program

    $p^* = (P_a.(a := a - 1).V_a)^* \,|\, (P_a.(a := a/2).V_a)^*$        (6)




What are the possible sets of values reached, for a, starting with $a \in [0, 1]$? The
associated shadow automaton $S_p$ has been determined in Example 13 (this automaton is
reduced) together with relations, that we will not be using in this article, yet. In many
ways, this reduced shadow automaton plays the role of a compact control flow graph
for the program we are analyzing. Calling $M_0$ and $M_1$ the two schedulings labeling its
transitions (drawn as pictograms in Example 13), $X_{M_0}$ has the effect
a := a/2 on the environment and $X_{M_1}$ has the effect a := a - 1.

We are now in a position to interpret the arrows of the shadow automaton as simple
abstract transfer functions and produce a system of equations for which we want to
determine a least-fixed point, to get the invariant of the program at the (multi-)control
point which is the pair of the heads of the loops of each process. The interpretation
on the shadow automaton (ignoring the initial state I in that picture, for
simplicity's sake) can be graphically pictured as:

[Figure: an automaton with two states 0 and 1. The loop on state 0 and the edge from 1
to 0 are labeled [a := a - 1]; the loop on state 1 and the edge from 0 to 1 are labeled
[a := a/2].]


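Given the abstract transfer functions on each edge of the shadow automaton, we
produce as customary the abstract semantic equations, one per node, by joining all transfer
functions corresponding to ingoing edges to that node:

    $\begin{pmatrix} A_0 \\ A_1 \end{pmatrix} = F \begin{pmatrix} A_0 \\ A_1 \end{pmatrix} = \begin{pmatrix} I \cup (A_0 - 1) \cup (A_1 - 1) \\ I \cup (A_0 / 2) \cup (A_1 / 2) \end{pmatrix}$        (7)
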

This set of semantic equations can be seen as a least-fixed point equation, that we can
solve using any of our favorite tools, for instance Kleene iteration and
widening/narrowing, on any abstract domain, such as the domain of intervals as in the example
below. The least-fixed point that we are looking for is thus $A^\infty = \mathrm{lfp}\, F$,
where F is the function defined in (7) and I = [0, 1]. A Kleene iteration on this
monotonic function F on the lattice of intervals over R reveals that $A_0^\infty = A_1^\infty = ]-\infty, 1]$.
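
This computation can be replayed with a minimal Python sketch of Kleene iteration with widening on intervals. The sketch assumes that the system to solve is $A_0 = I \cup (A_0 - 1) \cup (A_1 - 1)$ and $A_1 = I \cup (A_0/2) \cup (A_1/2)$ with I = [0, 1]; the second equation is our reading of the automaton, in which both edges entering state 1 carry a := a/2. Intervals are coded as pairs of floats with None for the empty interval, widening is applied at every step, and all names are illustrative.

```python
import math

BOTTOM = None  # the empty interval


def join(a, b):
    """Smallest interval containing both a and b."""
    if a is BOTTOM:
        return b
    if b is BOTTOM:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def shift(a, c):
    """Abstract effect of a := a + c."""
    return BOTTOM if a is BOTTOM else (a[0] + c, a[1] + c)


def halve(a):
    """Abstract effect of a := a / 2."""
    return BOTTOM if a is BOTTOM else (a[0] / 2, a[1] / 2)


def widen(old, new):
    """Standard interval widening: an unstable bound jumps to infinity."""
    if old is BOTTOM:
        return new
    if new is BOTTOM:
        return old
    low = old[0] if new[0] >= old[0] else -math.inf
    high = old[1] if new[1] <= old[1] else math.inf
    return (low, high)


I = (0.0, 1.0)


def F(a0, a1):
    """One application of the assumed system of semantic equations."""
    return (join(I, join(shift(a0, -1), shift(a1, -1))),
            join(I, join(halve(a0), halve(a1))))


def solve():
    """Iterate F from bottom, widening at each step, until stabilization."""
    a0, a1 = BOTTOM, BOTTOM
    while True:
        f0, f1 = F(a0, a1)
        n0, n1 = widen(a0, f0), widen(a1, f1)
        if (n0, n1) == (a0, a1):
            return a0, a1
        a0, a1 = n0, n1


invariant = solve()
```

Under these assumptions the iteration stabilizes after a few steps on the pair of intervals with lower bound minus infinity and upper bound 1 for both nodes, which agrees with the invariant stated above. In general, widening only guarantees a post-fixed point, which a narrowing phase may then refine.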

We have presented this example in order to show how the reduced shadow automaton
can be used to apply usual static analysis methods to concurrent programs,
avoiding state-space explosion as much as possible. It has the advantage of being short,
however it does not really show the main interest of our technique: the shadow
automaton allows us to take into account properties which tightly depend on the way the
synchronizations constrain the executions of the programs.

4 Conclusion and Future work 

We have presented an algorithm to compute a finite presentation of the trace
space of concurrent programs, which may contain loops. An application to abstract
interpretation has also been described but remains to be implemented. In order to give a simple
presentation of the algorithm, we have restricted ourselves here to programs of a simple
form (in particular, we have omitted non-determinism). We shall extend our algorithm
to more realistic programming languages in a subsequent article. Our approach can also
be applied to languages with other synchronization primitives (monitors, send/recv,
etc.), for which there are simple geometric semantics available. There are also many
possible general improvements of the algorithm; the most appealing one would perhaps
be to find a more modular way of computing the total schedulings
by combining locally computed schedulings in T(X)(x,y) with varying endpoints x
and y. In the near future, the schedulings provided by the algorithm will be used by our
tool ALCOOL to analyze concurrent programs using abstract interpretation, thus
providing one of the first tools able to do such a static analysis on concurrent programs
without forgetting most of the possible synchronizations during their execution.

On the theoretical side, we plan to study in detail and use the structure of the
index poset C(X), which contains much more information than only the schedulings
of the program. Namely, it can be equipped with a structure of prodsimplicial set [22]
(a structure similar to simplicial sets but whose elements are products of simplexes),
whose geometric realization provides a topological space which is homotopy equivalent
to the trace space T(X) [24]. This essentially means that C(X) contains all the
geometry of the trace space and we plan to try to benefit from all the information it
provides about the possible computations of a program. Our ALCOOL prototype actually
implements this computation, using a combinatorial presentation of the prodsimplicial
sets known as simploidal sets [23], which will be reported elsewhere.